library(tidyverse)
── Attaching packages ───────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0     ✔ purrr   0.2.5
✔ tibble  2.0.1     ✔ dplyr   0.7.8
✔ tidyr   0.8.2     ✔ stringr 1.3.1
✔ readr   1.3.1     ✔ forcats 0.3.0
package ‘tibble’ was built under R version 3.5.2
── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(here)
here() starts at /Users/scottericr/Documents/Tufts/Research Projects/PLS Oecologia
library(neonUtilities)
After chatting with Katie at NEON, we decided a good place to start would be small mammal data and vegetation data. She recommended three data products: the small mammal trapping data, which is capture-recapture data collected monthly and includes morphology measures and tick counts; the plant presence and percent cover data product, which records the presence and percent cover of different plant species; and Woody Plant Vegetation Structure, which includes morphological measures of woody plants.
Let’s start with woody plants. This dataset is measured every three years, so I’ll get three years of data from a bunch of sites and collapse it down somehow.
stackByTable(here("data", "NEON", "NEON_presence-cover-plant.zip")) # only need to run once
Unpacking zip files
Stacking table div_10m2Data100m2Data
Stacking table div_1m2Data
Finished: All of the data are stacked into 2 tables!
Copied the first available variable definition file to /stackedFiles and renamed as variables.csv
Copied the first available validation file to /stackedFiles and renamed as validation.csv
Stacked div_10m2Data100m2Data which has 71375 out of the expected 71375 rows (100%).
Stacked div_1m2Data which has 74736 out of the expected 74736 rows (100%).
Stacking took 13.91035 secs
All unzipped monthly data folders have been removed.
Hmm, not really useful on its own.
I’m using the 1 meter square data because that’s what Kate suggested. I’m not sure what the other file is.
So cover is either a focal plant species (taxonID) or an “other variable” (otherVariables) such as litter, wood, moss, or bare soil. I should combine these columns (after checking that it’s always one or the other) and then spread them.
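Here is a minimal sketch of that check-then-combine step. It runs on a made-up toy table (`toy_cover` and its values are hypothetical, not NEON data) and assumes both columns are character. `coalesce()` is one way to merge the two columns once they are confirmed to be mutually exclusive.

```r
library(dplyr)

# Toy stand-in for the 1 m² cover table (values are invented for illustration)
toy_cover <- tibble(
  plotID         = c("P1", "P1", "P2", "P2"),
  taxonID        = c("ACRU", NA, "QURU", NA),
  otherVariables = c(NA, "litter", NA, "moss"),
  percentCover   = c(12, 40, 30, 5)
)

# Tabulate rows by which of the two columns is filled in
fill_check <- toy_cover %>%
  count(has_taxon = !is.na(taxonID), has_other = !is.na(otherVariables))

# TRUE when no row has both taxonID and otherVariables filled in
exclusive <- !any(fill_check$has_taxon & fill_check$has_other)

# coalesce() takes the first non-missing value across the two columns
toy_combined <- toy_cover %>%
  mutate(variable = coalesce(taxonID, otherVariables)) %>%
  select(-taxonID, -otherVariables)
```

If `exclusive` comes back FALSE on the real data, the rows where both columns are filled would need a closer look before combining, because `coalesce()` would silently keep only the taxonID.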
Let’s check how many unique plots this is.
313 plots. Ok, not so HUGE, but pretty big. It would be nice to get this number under 100 while maintaining a big range in elevation and latitude. Let’s see what I can figure out about each site.
Ok, looks like the first 4 sites have a big range in elevation, so I’ll just use those. At some point, I should figure out where these are.
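For the record, this kind of per-site comparison can be sketched as below. The table `toy_plots` and its values are hypothetical. The column names mirror the ones in the cover data, but the sketch is not the exact code behind the numbers above.

```r
library(dplyr)

# Toy plot-level table (invented values; two sites, two plots each)
toy_plots <- tibble(
  siteID          = c("A", "A", "A", "B", "B"),
  plotID          = c("A_001", "A_001", "A_002", "B_001", "B_002"),
  elevation       = c(100, 100, 900, 50, 60),
  decimalLatitude = c(42.5, 42.5, 42.6, 35.1, 35.2)
)

# One row per site: number of distinct plots, elevation range, mean latitude
site_summary <- toy_plots %>%
  group_by(siteID) %>%
  summarize(
    nPlots       = n_distinct(plotID),
    elevRange    = max(elevation, na.rm = TRUE) - min(elevation, na.rm = TRUE),
    meanLatitude = mean(decimalLatitude, na.rm = TRUE)
  ) %>%
  arrange(desc(elevRange))  # sites with the widest elevation range first
```

Sorting by `elevRange` makes it easy to pick the handful of sites that cover the most elevation while keeping the total plot count (the sum of `nPlots`) in view.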
cover.wide <-
  cover2 %>%
  select(namedLocation, siteID, decimalLatitude, decimalLongitude, elevation, plotID, subplotID, endDate, taxonID, otherVariables, percentCover) %>%
  # label each row by its taxon if present, otherwise by its non-plant cover class
  mutate(variable = case_when(!is.na(taxonID) ~ taxonID,
                              !is.na(otherVariables) ~ otherVariables,
                              TRUE ~ NA_character_)) %>%
  select(-taxonID, -otherVariables) %>%
  group_by(namedLocation, variable) %>%
  summarize(meanElevation = mean(elevation, na.rm = TRUE),
            meanLatitude = mean(decimalLatitude, na.rm = TRUE),
            meanLongitude = mean(decimalLongitude, na.rm = TRUE), # this comma was missing and caused a parse error
            meanPercentCover = mean(percentCover, na.rm = TRUE)) %>%
  ungroup() %>%
  # assumed continuation: the chunk was cut off after the pipe; spreading follows the plan noted above
  spread(key = variable, value = meanPercentCover)
Oh, that’s a lot of NAs. Let’s get rid of some columns?
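One way to get rid of sparse columns is to drop any column whose share of missing values exceeds a cutoff. The sketch below uses a made-up wide table (`toy_wide`), and the 50% cutoff is an arbitrary assumption rather than a recommendation.

```r
library(dplyr)

# Toy wide table: one row per location, one column per cover variable (invented values)
toy_wide <- tibble(
  namedLocation = c("L1", "L2", "L3", "L4"),
  ACRU          = c(10, NA, 5, 8),
  QURU          = c(NA, NA, NA, 3),
  litter        = c(40, 35, NA, 50)
)

# Fraction of missing values in each column
na_fraction <- colMeans(is.na(toy_wide))

# Keep columns with at most half of their values missing (arbitrary cutoff)
keep <- names(na_fraction)[na_fraction <= 0.5]
toy_trimmed <- toy_wide %>% select(one_of(keep))
```

An alternative worth weighing is treating a missing taxon as 0% cover instead of dropping it, which `spread()` supports through its `fill` argument. Whether that is appropriate depends on whether an absent row really means the species was not observed.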
Hmm, maybe just one site? Maybe HARV?
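# try PCA and PLSR
library(ropls) # provides opls()
meta <- c("siteID", "namedLocation", "meanElevation", "meanLatitude", "meanLongitude")
test.pca <- opls(select(test, -one_of(meta))) # no response supplied, so this is a PCA
test.pls <- opls(select(test, -one_of(meta)), test$siteID) # site as the response
har.pls <- opls(select(harv, -one_of(meta)), harv$meanElevation) # elevation as the response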
Error in if (modelDF[hN, "R2Y"] < 0.01) { :
missing value where TRUE/FALSE needed
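This error means the test `modelDF[hN, "R2Y"] < 0.01` evaluated to NA, so R2Y itself is NA or NaN rather than just small. The failing line is the check for a very bad model (R2Y below 0.01), but it never gets a usable R2Y to compare. A likely cause is missing values or zero-variance columns in the predictors or the response, though I haven’t confirmed that.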
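Since the error says a TRUE/FALSE test received a missing value, one thing worth checking before refitting is whether the inputs contain missing values or columns with no variation. The helper below is a hypothetical diagnostic written in base R. It is not part of ropls, and whether these conditions are the actual cause here is an assumption.

```r
# Hypothetical helper (not an ropls function): summarize problems in X and y
check_model_inputs <- function(X, y) {
  X <- as.data.frame(X)
  # Per-column variance, ignoring missing values
  col_var <- vapply(X, function(col) stats::var(col, na.rm = TRUE), numeric(1))
  list(
    n_missing_X   = sum(is.na(X)),                          # total NA cells in X
    n_missing_y   = sum(is.na(y)),                          # NA entries in y
    zero_var_cols = names(X)[!is.na(col_var) & col_var == 0], # constant columns
    y_is_constant = length(unique(stats::na.omit(y))) < 2   # y has fewer than 2 distinct values
  )
}

# Toy inputs (invented): one NA in column a, a constant column b, a constant response
toy_X <- data.frame(a = c(1, 2, NA, 4), b = c(5, 5, 5, 5))
toy_y <- c(10, 10, 10, 10)

diagnosis <- check_model_inputs(toy_X, toy_y)
```

On the real data this would be called as `check_model_inputs(select(harv, -one_of(meta)), harv$meanElevation)`. Any non-empty entry in the result is a candidate explanation to rule out before blaming the model itself.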
opls(X, harv.litter$litter)
Error in if (modelDF[hN, "R2Y"] < 0.01) { :
missing value where TRUE/FALSE needed
NOPE